Introduce multi-period Account data type and use it for MultiBalanceReport and BudgetReport. #2360
Thanks! Initial comments.
hledger/test/balance/balance.test
# ** 16. balance --flat --empty does not display accounts which have not been
# seen, even if they're implied, but does show accounts that have been seen
# with 0 balance.
I'm not sure what this means in userese ?
It means that if there have never been any postings to `assets`, then we shouldn't display a value for the `assets` account, even with `--empty`. On the other hand, we should show `assets:bank:checking`, since there have been postings to that account.
This was the case before, but turned out to be a non-trivial thing to maintain in the refactoring, so I added a test.
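To make the distinction concrete, here is a minimal sketch of "seen" versus "implied" accounts. It is not hledger's code: postings are reduced to hypothetical `(AccountName, Integer)` pairs, and `splitAccount`, `ancestors`, `flatEmptyRows` and `impliedOnly` are illustrative names only.

```haskell
import Data.List (inits, intercalate)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type AccountName = String

-- Split "assets:bank:checking" into ["assets", "bank", "checking"].
splitAccount :: AccountName -> [String]
splitAccount s = case break (== ':') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitAccount rest

-- Strict ancestors of an account, outermost first.
ancestors :: AccountName -> [AccountName]
ancestors = map (intercalate ":") . drop 1 . init . inits . splitAccount

-- Rows of a flat report with --empty in this sketch: every account that
-- received at least one posting, even when its postings sum to zero.
flatEmptyRows :: [(AccountName, Integer)] -> [(AccountName, Integer)]
flatEmptyRows = Map.toList . Map.fromListWith (+)

-- Parents that exist only by implication, and so get no row of their own.
impliedOnly :: [(AccountName, Integer)] -> [AccountName]
impliedOnly ps = Set.toList (implied `Set.difference` posted)
  where
    posted  = Set.fromList (map fst ps)
    implied = Set.fromList (concatMap (ancestors . fst) ps)
```

With two postings of 10 and -10 to `assets:bank:checking`, `flatEmptyRows` still lists that account with a zero balance, while `impliedOnly` reports `assets` and `assets:bank` as accounts that should stay hidden.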
I've changed the test description. Let me know if it's clearer.
@@ -50,7 +50,6 @@ Budget performance in 2016-12-01..2016-12-03:
                  ||          2016-12-01          2016-12-02           2016-12-03
==================++==============================================================
assets:cash       || $-10 [ 40% of $-25]  $-14 [56% of $-25]  $-51 [204% of $-25]
expenses          ||  $10 [ 40% of  $25]   $14 [56% of  $25]   $51 [204% of  $25]
Another behaviour change, worth noting with a ! in message.
The parent "expenses" account is not shown, because there's no explicit budget goal for it, and because we're in list mode ? So if we want to see aggregated budget performance, tree mode will be needed. Ok I guess.
Why is it still shown in the previous test ?
Let me look into this.
It looks like the reason is as follows:
Unless called with `-E`, a budget report will count any unbudgeted subaccounts against their earliest budgeted parent. So both `expenses:cab` and `expenses:movies` are rolled up to `expenses`. Even though `expenses` doesn't have a budget itself, it gets the sum of the budgets of its subaccounts.
I'm not sure how I feel about this behaviour, but I think changing it is out of scope for this PR.
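The roll-up described here can be sketched roughly as below. This is one reading of the comment, not hledger's implementation: it assumes that parents of budgeted accounts count as budgeted too, and that "earliest budgeted parent" means the nearest such ancestor. All names (`selfAndAncestors`, `budgetedSet`, `rollUpTo`, `rollUpActuals`) are hypothetical, and amounts are plain `Integer`s.

```haskell
import Data.List (find, inits, intercalate)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type AccountName = String

-- Split "expenses:cab" into ["expenses", "cab"].
splitAccount :: AccountName -> [String]
splitAccount s = case break (== ':') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitAccount rest

-- The account itself followed by its ancestors, nearest first.
selfAndAncestors :: AccountName -> [AccountName]
selfAndAncestors =
  reverse . map (intercalate ":") . drop 1 . inits . splitAccount

-- Assumption: an account counts as budgeted if it, or any of its
-- subaccounts, has an explicit budget goal.
budgetedSet :: [AccountName] -> Set.Set AccountName
budgetedSet = Set.fromList . concatMap selfAndAncestors

-- Nearest budgeted account at or above the given one, if any.
rollUpTo :: Set.Set AccountName -> AccountName -> Maybe AccountName
rollUpTo budgeted = find (`Set.member` budgeted) . selfAndAncestors

-- Sum actual amounts under the account they roll up to. Amounts with no
-- budgeted ancestor are collected under the Nothing key.
rollUpActuals
  :: [AccountName] -> [(AccountName, Integer)] -> Map.Map (Maybe AccountName) Integer
rollUpActuals goals actuals =
  Map.fromListWith (+) [(rollUpTo budgeted a, amt) | (a, amt) <- actuals]
  where
    budgeted = budgetedSet goals
```

With a single goal on a hypothetical `expenses:food` account, actuals for `expenses:cab` and `expenses:movies` both land on `expenses`, which matches the behaviour described above.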
data AccountBalances a = AccountBalances {
   abhistorical :: a              -- ^ historical balance information
  ,abdatemap    :: IM.IntMap a    -- ^ balance information associated to a start day
  } deriving (Eq, Functor, Generic)
What's the reason for using a type parameter ? Do we truly need it ?
How does the "start day" map, with Int keys I assume, work here ?
> What's the reason for using a type parameter ? Do we truly need it ?

For budget report, I guess. It hurts code comprehensibility a bit.
Yes, the main use is for the budget report. But I think it also simplifies things a bit by exposing the functor, foldable, and traversable interfaces for `AccountBalances`, saving having to write out a lot of boilerplate to perform tasks that re-implement that functionality in a monomorphic container.
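As a rough illustration of what the type parameter buys, the sketch below redeclares a cut-down `AccountBalances` with derived `Functor`, `Foldable` and `Traversable` instances. The field names follow the snippet above; the plain `Int` keys, the `Integer` payload, and the helper names `scaleAll`, `grandTotal` and `allNonNegative` are assumptions for the example and do not come from the PR.

```haskell
{-# LANGUAGE DeriveTraversable #-}
import qualified Data.IntMap.Strict as IM

-- Cut-down stand-in for the PR's type; keys are plain Ints here.
data AccountBalances a = AccountBalances
  { abhistorical :: a            -- historical balance information
  , abdatemap    :: IM.IntMap a  -- balance information per period start
  } deriving (Eq, Show, Functor, Foldable, Traversable)

-- Functor: transform every stored balance in one pass.
scaleAll :: Integer -> AccountBalances Integer -> AccountBalances Integer
scaleAll k = fmap (* k)

-- Foldable: combine the historical value and all period values.
grandTotal :: AccountBalances Integer -> Integer
grandTotal = sum

-- Traversable: run a check on every balance, failing if any one fails.
allNonNegative :: AccountBalances Integer -> Maybe (AccountBalances Integer)
allNonNegative = traverse (\x -> if x >= 0 then Just x else Nothing)
```

A monomorphic container would need a hand-written equivalent of each of these three helpers; here they fall out of the derived instances.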
Could we quantify that a little more - eg "balance reports are 1% faster with 1k txns, 5% faster with 10k txns" ?
Also I wonder if there's any memory impact,
This upgrades Account to enable it to do the hard work in MultiBalanceReport, but does not use the new functionality just yet. It continues to function as before by only using the "abhistorical" value.
For use in budget reports.
Ensure that implied accounts with no postings are not displayed, but accounts with zero balance and actual postings are.
Rephrase everything in terms of boringness to make for a clearer logical flow.
This removes the type alias Account, and replaces it with the fully-qualified name Account AccountBalance. This breaks some backwards compatibility, but that was already broken by the change of Account type constructor in any case. This simplifies the interface.
Rename applyAccountBalance to mapAccountBalance.
mergeWithKey can create corrupt output if its inputs don't satisfy certain conditions. We restrict the domain here to only those cases where it is guaranteed safe. This still covers all the cases that we need.
This keeps Hledger.Data.AccountBalance and Hledger.Data.AccountBalances separate.
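On the `mergeWithKey` commit above: `Data.IntMap.Strict.mergeWithKey` requires that the two "only in one map" functions return a map whose keys are a subset of the keys they were given. One way to restrict the domain, sketched here under the hypothetical name `mergeTotal` (not the PR's actual wrapper), is to accept plain per-value functions and lift them with `fmap`, which cannot change the key set.

```haskell
import qualified Data.IntMap.Strict as IM

-- A restricted merge: callers supply only per-value functions, so the
-- subset-of-keys precondition on mergeWithKey's last two function
-- arguments holds by construction.
mergeTotal
  :: (a -> b -> c)  -- key present in both maps
  -> (a -> c)       -- key present only in the left map
  -> (b -> c)       -- key present only in the right map
  -> IM.IntMap a -> IM.IntMap b -> IM.IntMap c
mergeTotal both onlyLeft onlyRight =
  IM.mergeWithKey (\_ x y -> Just (both x y)) (fmap onlyLeft) (fmap onlyRight)
```

For example, `mergeTotal (+) id negate` adds values on shared keys, keeps left-only values, and negates right-only values.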
Here is the benchmarking. Marginal change for small journals, but about 5% time savings for the 100k journal, and roughly comparable for my real-life journal (21k transactions, 796 accounts of depth 7).
It looks like memory use is a bit higher with the new code. Lower heap use, but higher maximum residency.
I think I've responded to all comments. Let me know if you want to discuss further.
    accountMap = processPostings ps

    processPostings :: [Posting] -> HM.HashMap AccountName (AccountBalances AccountBalance)
    processPostings = foldl' (flip processAccountName) mempty
There's a question about whether we get better performance with this as `foldl'` or `foldr`. It seems that `foldl'` is slightly faster, while `foldr` has better memory usage.
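For readers comparing the two variants, here is a small stand-in for the fold in question. It swaps the PR's `HashMap` of `AccountBalances` for a `Data.Map.Strict` map of `Integer` totals, and `Posting`, `addPosting`, `sumLeft` and `sumRight` are simplified, hypothetical definitions rather than hledger's own. It shows only that the two folds are interchangeable in result, not which one performs better.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

type AccountName = String

-- Simplified stand-in for hledger's Posting type.
data Posting = Posting
  { paccount :: AccountName
  , pamount  :: Integer
  }

-- Add one posting's amount to the running per-account totals.
addPosting :: Posting -> Map.Map AccountName Integer -> Map.Map AccountName Integer
addPosting p = Map.insertWith (+) (paccount p) (pamount p)

-- Strict left fold: visits postings first to last, forcing the
-- accumulator at each step.
sumLeft :: [Posting] -> Map.Map AccountName Integer
sumLeft = foldl' (flip addPosting) Map.empty

-- Right fold: combines postings last to first.
sumRight :: [Posting] -> Map.Map AccountName Integer
sumRight = foldr addPosting Map.empty
```

Because addition is commutative and associative, both functions produce the same map; any difference is in evaluation order, which is what the timing and memory figures in this thread are measuring.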
Here's the time and memory usage when using the foldr version. I wonder if foldr is the winner here.
Running 6 tests 5 times with 3 executables at 2025-04-25 19:44:53 AEST:
Best times:
+--------------------------------------------------------------------++------------------+------------------------+------------------------------+
| || ./hledger-master | ./hledger-multiaccount | ./hledger-multiaccount-foldr |
+====================================================================++==================+========================+==============================+
| -f examples/10ktxns-1kaccts.journal balance || 0.81 | 0.80 | 0.81 |
| -f examples/1ktxns-1kaccts.journal balance --weekly || 0.68 | 0.65 | 0.64 |
| -f examples/10ktxns-1kaccts.journal balance --weekly || 7.91 | 7.63 | 7.94 |
| -f examples/100ktxns-1kaccts.journal balance --yearly || 11.29 | 10.92 | 10.84 |
| balance --value=end @/home/myname/expenses-report.args || 1.39 | 1.32 | 1.32 |
| balance --layout=tidy --daily @/home/myname/assetsliabilities.args || 1.49 | 1.46 | 1.46 |
+--------------------------------------------------------------------++------------------+------------------------+------------------------------+
$ ./hledger-multiaccount-foldr-prof -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s > /dev/null
87,949,386,664 bytes allocated in the heap
8,323,439,592 bytes copied during GC
558,243,136 bytes maximum residency (15 sample(s))
13,941,192 bytes maximum slop
1551 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 21167 colls, 0 par 2.663s 2.680s 0.0001s 0.0100s
Gen 1 15 colls, 0 par 2.385s 2.394s 0.1596s 0.5332s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.002s ( 0.002s elapsed)
MUT time 56.837s ( 56.914s elapsed)
GC time 5.048s ( 5.074s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 61.888s ( 61.990s elapsed)
Alloc rate 1,547,396,211 bytes per MUT second
Productivity 91.8% of total user, 91.8% of total elapsed
Actually, I think I messed up the memory analysis by using profiled versions of the executables. The answer is less dramatic for the normal versions. I've included them here.
$ ./hledger-master -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
43,526,689,224 bytes allocated in the heap
4,818,585,000 bytes copied during GC
376,952,024 bytes maximum residency (13 sample(s))
3,414,824 bytes maximum slop
1041 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 10559 colls, 0 par 1.651s 1.660s 0.0002s 0.0043s
Gen 1 13 colls, 0 par 1.711s 1.718s 0.1322s 0.3766s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.002s ( 0.001s elapsed)
MUT time 7.573s ( 7.582s elapsed)
GC time 3.362s ( 3.378s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 10.937s ( 10.961s elapsed)
Alloc rate 5,747,632,285 bytes per MUT second
Productivity 69.2% of total user, 69.2% of total elapsed
$ ./hledger-multiaccount -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
41,742,275,096 bytes allocated in the heap
4,600,175,152 bytes copied during GC
388,933,464 bytes maximum residency (13 sample(s))
2,396,632 bytes maximum slop
999 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 10128 colls, 0 par 1.628s 1.637s 0.0002s 0.0034s
Gen 1 13 colls, 0 par 1.494s 1.499s 0.1153s 0.2873s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 7.544s ( 7.561s elapsed)
GC time 3.122s ( 3.136s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 10.667s ( 10.698s elapsed)
Alloc rate 5,533,021,721 bytes per MUT second
Productivity 70.7% of total user, 70.7% of total elapsed
$ ./hledger-multiaccount-foldr -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
41,751,866,552 bytes allocated in the heap
4,611,401,952 bytes copied during GC
381,899,736 bytes maximum residency (13 sample(s))
3,317,352 bytes maximum slop
1014 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 10129 colls, 0 par 1.528s 1.537s 0.0002s 0.0028s
Gen 1 13 colls, 0 par 1.503s 1.507s 0.1159s 0.3262s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.000s elapsed)
MUT time 7.356s ( 7.368s elapsed)
GC time 3.031s ( 3.044s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 10.387s ( 10.412s elapsed)
Alloc rate 5,676,066,670 bytes per MUT second
Productivity 70.8% of total user, 70.8% of total elapsed
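To put the three non-profiled runs above on a common scale, the relative differences can be worked out directly from the reported "Total time" and "maximum residency" figures. The helper below is only arithmetic on those numbers; the names are made up for the example, and each figure comes from a single run.

```haskell
-- Percentage change from a baseline value to a new value.
percentChange :: Double -> Double -> Double
percentChange old new = (new - old) / old * 100

-- "Total time" in seconds, copied from the +RTS -s output above.
timeMaster, timeFoldl, timeFoldr :: Double
timeMaster = 10.937
timeFoldl  = 10.667  -- ./hledger-multiaccount
timeFoldr  = 10.387  -- ./hledger-multiaccount-foldr

-- "maximum residency" in bytes, copied from the same output.
resMaster, resFoldl, resFoldr :: Double
resMaster = 376952024
resFoldl  = 388933464
resFoldr  = 381899736
```

Evaluating `percentChange timeMaster timeFoldl` and `percentChange timeMaster timeFoldr` gives roughly -2.5% and -5.0%, while `percentChange resMaster resFoldl` and `percentChange resMaster resFoldr` give roughly +3.2% and +1.3%.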
This rejigs the `MultiBalanceReport` internals to use an enhanced `Account` data type to save the values. This has a few effects, including that the enhanced `Account` means that `BudgetReport` can be simplified.
There are some small changes in behaviour with respect to budget reports, where it looked like some behaviour was implemented to work around needing to get the budget and actuals into the same shape so they could be merged. This is no longer necessary, but may still be desired for other reasons.
Let me know your thoughts.